There are many software database management systems available
on many general-purpose computers, ranging from micros to
super-mainframes, with many distinct functionalities such as
relational vs. hierarchical and text-retrieval vs.
formatted-data-retrieval-and-update. Do we really need a few
special-purpose machines for database management? In particular,
there is Grosch's law, which says the following:

"Whenever the capacity of a mainframe computer is saturated 
with the present work load, there is always another more powerful 
mainframe which can support the work load cost-effectively (with 
spare capacity)." 

For example, if a computer system such as the IBM 3033 is
saturated with database management tasks and the database on
the IBM 2314 disks has grown to capacity, we can replace the IBM
3033 CPU with an IBM 3081 CPU and the IBM 2314 disks with IBM 3340
or 3380 disks. To this example, Grosch's law may apply. However,
Grosch's law does not always work. Consider the next example.
The presence of a communications frontend computer can offload the
communications work from the mainframe cost-effectively so that,
instead of replacing the present mainframe with a more powerful
model due to heavy communications, we can retain the mainframe
longer. As it has turned out, the communications frontend
provides, in addition to cost-effectiveness, improved performance
and new functionality (e.g., serving as a gateway to networks). In
other words, Grosch's law fails to hold only if the
special-purpose computer, which offloads certain types of work
from the mainframe, can provide lower cost, higher performance,
and newer functionality.



Database machines as backend computers can offload the
database management work from the mainframe so that we can retain
the same mainframe longer. However, the database backend must also
demonstrate lower cost, higher performance, and newer
functionality.



How to Keep the Cost Low?

[...] management functions and then realize these functions
directly in the hardware. Instead of applying a single
architectural principle to the entire machine, such as applying
the pipelining principle to come up with a pipelined machine, we
should apply architectural principles such as pipelining,
concurrency and parallelism at various design levels of the
database machine. From the viewpoint of transforming software
techniques for database management into hardware database machine
components, we should not use unproven or seldom-practiced
software techniques such as data pools [1]. Instead, we should
transform proven software techniques such as indexing and
clustering into hardware.

How to Keep the Performance High? 

Let us take a look at the following gross aggregates of a
database management system (DBMS):

[Figure: gross aggregates of a DBMS. Surviving labels:
Mainframe(s), User(s).]

First Hypothesis (observation): No matter how great the amount of
data to be processed by the database processors, we can always
process the data at the rate at which they are received.

Second Hypothesis: No matter how complex the DBMS software is,
there is always a hardware architecture which can cause the
execution of the DBMS to be I/O-bound.

The aforementioned two hypotheses, i.e., my observations, have
important consequences. To articulate their importance, I express
these consequences as corollaries.

Corollary One: Building very fast and massive database processors 
is not a difficult task. 

Corollary Two: Supporting communications interfaces,
preprocessing database transactions and executing DBMS software do
not affect the performance of the machine.

Corollary Three: The performance of the machine is proportional to
the rate at which data can be moved in and out of the database
stores.

What these three corollaries are saying is that the
performance of a database machine hinges on its 'I/O bandwidth'
between the database stores and the database processors. The
machine performance has little to do with the amount of processing
that the database processors must perform, since we know how to
build fast processors.
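
As a rough illustration of Corollary Three, the sketch below
models the machine as two stages in series, the database stores
and the database processors, and takes the sustained throughput to
be the slower of the two rates. The function names and all figures
are hypothetical and chosen only for illustration; they are not
measurements of any machine.

```python
def sustained_throughput(io_bandwidth: float, processing_rate: float) -> float:
    """Records per second that the two-stage machine can sustain.

    Data cannot be processed faster than the stores deliver it, and
    cannot be delivered faster than the processors consume it, so the
    slower stage sets the pace.
    """
    return min(io_bandwidth, processing_rate)


def is_io_bound(io_bandwidth: float, processing_rate: float) -> bool:
    """True when the stores, not the processors, limit throughput."""
    return io_bandwidth <= processing_rate


# Hypothetical figures: processors ten times faster than the stores.
baseline = sustained_throughput(io_bandwidth=1_000, processing_rate=10_000)
faster_cpu = sustained_throughput(io_bandwidth=1_000, processing_rate=20_000)
wider_io = sustained_throughput(io_bandwidth=2_000, processing_rate=10_000)
```

Under these assumed figures, doubling the processing rate leaves
the throughput unchanged, while doubling the I/O bandwidth doubles
it. This is the sense in which an I/O-bound machine's performance
follows its I/O bandwidth.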

How can we achieve a very high rate of data movement between 
the database stores and the database processors? 

Solution One: At the device level, we may, for example, use
parallel read-out-and-write-in disks and simultaneous
read-out-and-write-in drives. In other words, we may have parallel
data streams coming out of and going into an individual disk. In
addition, we may have many such disks moving parallel data streams
in and out of the drives simultaneously.

Solution Two: At the system level, we process the incoming or
outgoing data streams separately and in parallel, at the speed of
data movement, with minimal communication among the processors.
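
The structure of Solution Two can be sketched in software as
follows. Each data stream is scanned by its own worker, and the
workers exchange nothing while scanning. The names scan,
process_streams and required_processor_rate are hypothetical, and
Python threads stand in for the separate hardware processors only
to show the organization, not to achieve real hardware
parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def scan(stream, predicate):
    """One processor scans one stream and keeps the qualifying records."""
    return [record for record in stream if predicate(record)]


def process_streams(streams, predicate):
    """Scan every stream with its own worker; streams are never merged
    before scanning and the workers do not communicate."""
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        return list(pool.map(lambda stream: scan(stream, predicate), streams))


def required_processor_rate(stream_rate, n_streams, merged):
    """Rate one processor must sustain to keep up with its input.

    A processor fed by a merged stream must keep up with all the
    streams at once; a per-stream processor only with its own stream.
    """
    return stream_rate * n_streams if merged else stream_rate


# Hypothetical data: three streams read out in parallel.
streams = [[1, 5, 9], [2, 6], [7]]
selected = process_streams(streams, lambda record: record > 4)
```

With eight streams of equal rate, for instance, a processor behind
a merged stream would have to run eight times faster than each of
eight per-stream processors.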



In other words, we do not merge data streams before
processing, since the merged stream would have to move faster and
be processed sooner. Nor do we attempt to increase the traffic in
inter-processor communications or rely on complex communications
networks. With these solutions, we arrive at some important
consequences which have an impact on machine performance and
architecture. They are listed below.

Consequence Three: Due to the previous consequences, it follows
that the multiple use of cheaper processors and local memories is
[...] for sustaining high performance, that engineering changes
[...] disk's I/O bus structures and triggering mechanisms are
[...] (but no change to the read/write heads) and that a [...] of
the disk controller by incorporating multiple processors [...]
their local memories for the multiple data buses is [...]



Third Hypothesis: The processing of meta data [...]

Corollary Four: The meta data and raw data should have their
separate stores and processors. Furthermore, the processing of the
meta data should be made concurrent with the processing of the raw
data.

Consequence Four: The design of meta data stores and processors 
and the design of raw data stores and processors may be different 
and specially tailored for achieving concurrent processing of both 
types of data. 



Database practitioners do appreciate the differences between
the meta data and raw data of a database. They also appreciate the
different processing and storage requirements of these two types
of data. They should be pleased that the architects of future
database machines take these differences into consideration in
the design.



How to Provide Newer Functionality?

Fourth Hypothesis: Presently, every DBMS is model-specific, which
implies that it is language-specific, which in turn implies that
it is application-specific.

By model-specific, we mean that the DBMS is based on a single
data model. For example, the IBM IMS database system is based on
the hierarchical model. Consequently, DL/1 is a hierarchical
language and all the application programs written for IMS are in
the DL/1 language.

Solution Three: The new functionality of a future database machine
lies in its capability to support multiple data models (and,
therefore, multiple data languages and applications).

Corollary Five: The future database machine looks like, for
example, a relational machine to the relational database user, a
hierarchical machine to the hierarchical database user, a CODASYL
machine to the CODASYL database user, and a new machine to the new
database user.

Corollary Six: A single machine can support various model-specific 
databases and languages; or, there are many machines each of which 
can support a model-specific database and language. 

How do we go about designing and implementing a database 
machine which will support many models? 

Solution Four: Come up with a database kernel (or kernel machine)
which takes care of all the access and update operations of the
raw data and the meta data, which allows 'natural' mappings of
model-specific languages to the machine language of the kernel,
and which provides a model-general database structure for various
model-specific database organizations.
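
To make the idea of a kernel with model-specific mappings more
concrete, here is a minimal sketch. It assumes, purely for
illustration, that the model-general structure is a collection of
records made of attribute-value pairs and that the kernel language
has only two operations, insert and retrieve. The class names, the
attribute names FILE, KEY and PARENT, and the sample data are all
hypothetical and are not taken from any actual kernel design.

```python
class Kernel:
    """Model-general store: each record is a set of attribute-value pairs."""

    def __init__(self):
        self._records = []

    def insert(self, record):
        self._records.append(dict(record))

    def retrieve(self, **conditions):
        """Return copies of the records whose attributes equal the given values."""
        return [dict(r) for r in self._records
                if all(r.get(k) == v for k, v in conditions.items())]


class RelationalInterface:
    """Maps relational requests onto kernel operations."""

    def __init__(self, kernel):
        self.kernel = kernel

    def insert_tuple(self, relation, **attributes):
        self.kernel.insert({"FILE": relation, **attributes})

    def select(self, relation, **where):
        return self.kernel.retrieve(FILE=relation, **where)


class HierarchicalInterface:
    """Maps hierarchical (parent-child segment) requests onto the same kernel."""

    def __init__(self, kernel):
        self.kernel = kernel

    def insert_segment(self, segment_type, key, parent_key=None, **fields):
        self.kernel.insert(
            {"FILE": segment_type, "KEY": key, "PARENT": parent_key, **fields})

    def get_children(self, segment_type, parent_key):
        return self.kernel.retrieve(FILE=segment_type, PARENT=parent_key)


# Both interfaces share one kernel and one model-general structure.
kernel = Kernel()
relational = RelationalInterface(kernel)
hierarchical = HierarchicalInterface(kernel)
relational.insert_tuple("EMP", name="Ann", dept="D1")
relational.insert_tuple("EMP", name="Bob", dept="D2")
hierarchical.insert_segment("DEPT", key="D1")
hierarchical.insert_segment("PROJ", key="P1", parent_key="D1", title="X")
```

In this sketch each interface is a thin translation layer that does
no data-intensive work of its own; all access to stored records
goes through the kernel's two operations.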

Fifth Hypothesis: It is possible to come up with a low-cost and
high-performance database kernel machine which takes care of all
the data-intensive operations [2, 3, 4].

Sixth Hypothesis: It is also possible to discover natural mappings 
of model-specific languages to the machine language of the kernel. 
These mappings are computation-intensive and are not data- 
intensive [5, 6, 7, 8]. 

Corollary Seven: The mapping software (i.e., the model-specific
software interface) can be quickly executed by the database
processors.

Consequence Five: The support of multiple model-specific languages
has little impact on the performance and cost of the database
machine.

Conclusions: On the basis of these hypotheses, corollaries,
consequences and solutions, the future of database machines is
bright, since these are sound hypotheses, reasonable corollaries
and good solutions. We believe that the future database machine
can be cost-effective with high performance. It can also have new
functionality.